Revealing subnetwork roles using contextual visualization: comparison of metabolic networks

Abstract 

This article addresses a recurrent problem in biology: mining
newly built large-scale networks. Our approach consists in
comparing these new networks to well-known ones. The visual
backbone of this comparative analysis is provided by a network
classification hierarchy. This method is well suited to metabolic
networks since the comparison can be done using pathways
(clusters). Moreover, each network models an organism, and
organism classifications such as taxonomies exist. 
Video demonstration: 

http://www.labri.fr/perso/bourqui/video.wmv 



1 Background and motivation 

Visual mining of large networks is a challenging problem in
biology since more and more large networks are inferred from
high-throughput experiments (protein-protein interaction
networks, metabolic networks [8] and gene networks [7]). The
challenge is to understand the biological functions of their
different parts. A way to address this problem consists in
fitting parts of the data onto available knowledge. For instance,
when discovering a new biological network, if some elements have
already been assigned to a given function, then they will
probably behave in a similar way in the new network. 

Our collaboration with biologists led us to focus on a
particular biological research topic: metabolism. Metabolism is
the set of biochemical reactions (Figure 1(A)) that are used to
perform vital biological functions such as energy generation.
Each metabolic function is modelled by a set of interconnected
reactions corresponding to a small graph called a metabolic
pathway (Figure 1(B)) [13]. Since the output of a pathway is
often the input of another pathway, it is possible to merge all
these pathways into a single metabolic network (Figure 1(C)).
Each organism has its own metabolic network. For instance,
mammals and plants do not have 
Figure 1. The different scales of metabolic modeling. At the
first scale, a metabolic reaction turns a metabolite (in red)
into another one under the action of an enzyme (in green) (A). A
set of reactions corresponds to a metabolic pathway (B), which is
a subgraph of the entire metabolic network (C). 



the same metabolic network since only plants can generate energy
using the photosynthesis pathway. On the other hand, they share
biological functions, that is, pathways. When biologists explore
newly inferred metabolic networks, they have to make this kind of
comparison. But they are dealing with networks containing
hundreds of elements. Thus, the challenge is to provide a
visualization tool allowing them to easily mine new metabolic
networks by comparing them to already known ones. Based on their
observations, they will address the following questions: which
metabolic functions are shared by these organisms? Is it possible
to find a metabolic core shared by different organisms? 

The next section describes the task defined in collaboration
with biologists and the related visualization challenges. We then
present the data and model used to build the visualization, which
is described in the last section. 

2 Task and challenges 

Comparative study of biological networks is a powerful approach
in systems biology since it uses available knowledge to interpret
new networks. A first way to compare networks consists in looking
for topologically similar subnetworks. This approach is well
suited to understanding the evolution of organisms. In fact, two
topologically similar parts generally come from a duplication in
the genome during evolution. The comparison of networks is a
computationally difficult problem (see the graph isomorphism
problem in [6]). Nevertheless, heuristics have been proposed to
align metabolic pathways [10]. But the issue is then to scale to
the size of metabolic networks since they are ten times larger
than pathways. To overcome this problem, we propose to use the
annotations of these networks when they are available, for
instance by using labelled nodes or groups 
of nodes (clusters). It is then easier to identify common
subparts since the number of clusters is much lower than the
number of nodes (e.g. the boxes in Figure 1(C)). 

In this article we focus on a particular case study which raises
two more generic questions: comparing clustered networks and
comparing a set of networks with already known ones. In
particular, we focus on a set of organisms called
Alphaproteobacteria. It is a subgroup of the Proteobacteria,
which are a major group (phylum) of bacteria. They include a wide
variety of pathogens, such as Escherichia, Salmonella, Vibrio,
Helicobacter, and many other notable genera. There exist
different kinds of Alphaproteobacteria; we focus on three of
them: Rickettsiales (pathogens which cause a variety of diseases
in humans), Caulobacter vibrioides (a bacterium essential for the
carbon cycle) and Agrobacterium tumefaciens (a bacterium
responsible for tumors in plants). Their genomes have recently
been sequenced and, consequently, new metabolic networks were
built. Based on this data, our first aim is to help biologists
understand the different metabolic properties of each
Alphaproteobacterium. Moreover, this analysis is enhanced by
adding a context to the comparison. The context is provided by
supplementary knowledge: metabolic networks of other organisms
(for instance other Proteobacteria). 

A challenge raised by the biological questions that our
visualization addresses is the integration of different
representation scales: pathway, network, organism. It is thus
necessary to draw metabolic pathways, metabolic networks and a
backbone structure connecting them. To draw metabolic pathways
and networks we use dedicated graph drawing algorithms. The
challenge is then to embed these drawings in the representation
space. Since we are dealing with a comparative task, we need a
structure that highlights a logical organization of organisms. A
particularly well-suited structure is the hierarchy since it
allows abstraction. Indeed, in a hierarchical classification each
internal node models the common information contained in all the
nodes beneath it. In our case study, each internal node contains
shared pathways. 

In biology there exist several ways to build a hierarchy:
taxonomies (trees), phylogenies (trees) or ontologies (directed
acyclic graphs). For the Alphaproteobacteria task we chose to use
the taxonomy. It is important to note, however, that our approach
is generic enough to allow the use of any other kind of
hierarchy. 

Finally, the main challenge is that the data we propose to
visualize is quite complex since it contains the metabolic
networks of 29 different organisms. These networks are made of
21552 vertices and 27565 edges, organized into 4541 pathways.
These pathways share 629 different names. This large dataset also
raises a navigation problem since it is not possible to visualize
all the details simultaneously. 

2.1 Pathway oriented comparison 

Our method relies on the notion of metabolic pathways, which
provide a clustering of metabolic networks. Each pathway is
associated with a function; thus, comparing these pathways allows
identifying the functional similarities between two networks. To
do so, we compute the intersection of the sets of pathways of two
(or more) metabolic networks. 

Let $M_1$, $M_2$ and $M_{1-2}$ be three metabolic networks such
that $M_{1-2}$ is the intersection of $M_1$ and $M_2$. Let $P_1$
and $P_2$ be the sets of metabolic pathways of $M_1$ and $M_2$.
Each pathway $p = (V_p, E_p)$ is a subgraph of the network it
belongs to. We denote by $name(p)$ the name of $p$. Then the set
$P_{1-2}$ of metabolic pathways of $M_{1-2}$ is defined as
follows: 

1. $\forall p \in P_{1-2}$, $\exists p' \in P_1$ and
$p'' \in P_2$ such that $name(p) = name(p') = name(p'')$, and 

2. If $p = (V_p, E_p) \in P_{1-2}$,
$p' = (V_{p'}, E_{p'}) \in P_1$ and
$p'' = (V_{p''}, E_{p''}) \in P_2$ verify the first condition,
then $V_p = V_{p'} \cap V_{p''}$ and
$E_p = E_{p'} \cap E_{p''}$. 
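
The two conditions above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a network is stored as a dictionary mapping each pathway name to a pair (set of vertices, set of edges); the function name `intersect_pathways` and the toy data are hypothetical and do not come from the tool described in this paper.

```python
def intersect_pathways(p1, p2):
    """Intersect two networks given as {pathway name: (vertices, edges)}.

    A pathway is kept only if its name occurs in both networks
    (condition 1); its vertex and edge sets are the intersections of
    the two same-named pathways (condition 2).
    """
    common = {}
    for name in p1.keys() & p2.keys():
        v1, e1 = p1[name]
        v2, e2 = p2[name]
        common[name] = (v1 & v2, e1 & e2)
    return common


# Toy data (hypothetical): two networks sharing the pathway "pw_a".
net1 = {
    "pw_a": ({"m1", "m2", "m3"}, {("m1", "m2"), ("m2", "m3")}),
    "pw_b": ({"m7", "m8"}, {("m7", "m8")}),
}
net2 = {
    "pw_a": ({"m1", "m2", "m4"}, {("m1", "m2"), ("m2", "m4")}),
}
shared = intersect_pathways(net1, net2)
```

Here `shared` contains only "pw_a", restricted to the vertices and the edge present in both networks.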

This comparison step generates a new network that summarizes two
or more networks (for instance networks (a) and (b) in Figure 2).
This simplified view of the metabolism provides a view of core
metabolic functions (Figure 2(c)). As our comparison is based on
the names of the metabolic pathways, it would be biased if
different names were given to a unique function in several
networks. Thus, in the data used in this paper, the pathways of
all the 



Figure 2. (a) Rickettsia prowazekii metabolic network, (b) Rickettsia typhi metabolic network and (c) the 
intersection of (a) and (b) which corresponds to the typhus group. 



networks to compare are named according to the same naming
rules. The question is then to choose which networks to compare.
This is achieved by using a classification. 

2.2 Building metabolic network hierarchy 
3 Methodology 
Figure 4. Tree (a) is the taxonomy, described 
in the NCBI database, for our selected organ- 
isms. Tree (b) is the simplified version of the 
taxonomy. 




Figure 3. Metabolic network comparisons: 
the colors show the order in which the 
metabolic network intersections are com- 
puted from dark blue to yellow. Colored ar- 
rows indicate which networks are needed to 
compute the networks of their targets. 



To enhance and facilitate the comparison of metabolic networks,
we use a hierarchy DAG (Directed Acyclic Graph)
$H = (V_H, E_H)$ as a visual backbone. We use a DAG because it is
more generic than a tree. In $H$, vertices with an out-degree
equal to 0 represent the organisms to compare (e.g. the leaves in
a tree). The network associated with an internal vertex is then
the result of the comparison of all the organisms below it in the
hierarchy. Figure 3 shows the order in which metabolic networks
are computed (from the first in dark blue to the last in yellow);
colored arrows indicate which networks are compared. More
formally, let $u$ be a vertex of $H$ and $N^+(u)$ be the outgoing
neighborhood of $u$. Then the network corresponding to $u$ is the
intersection of the networks corresponding to all vertices of
$N^+(u)$. Thus, the networks of $N^+(u)$ are needed to compute
the network corresponding to $u$. We also define $leaves(u)$ as
the set of vertices $v$ such that the out-degree of $v$ is equal
to 0 and there exists a path from $u$ to $v$. It is then easy to
prove that the network corresponding to $u$ is the intersection
of the metabolic networks corresponding to the vertices of
$leaves(u)$. 
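
The bottom-up computation described above can be sketched as follows. This is only an illustration under stated assumptions: the hierarchy is given as a dictionary `children` mapping each internal vertex to its outgoing neighbors, each network is a dictionary {pathway name: (vertices, edges)}, and all names (`compute_networks`, `leaves`, the toy organisms `o1` to `o3`) are hypothetical.

```python
def intersect(n1, n2):
    """Pathway-wise intersection of two networks (name-based)."""
    return {name: (n1[name][0] & n2[name][0], n1[name][1] & n2[name][1])
            for name in n1.keys() & n2.keys()}


def compute_networks(children, leaf_networks):
    """Attach to every vertex the intersection of its children's networks.

    Results are memoized in `networks`, so a vertex shared by several
    parents in the DAG is computed only once.
    """
    networks = dict(leaf_networks)

    def visit(u):
        if u not in networks:
            nets = [visit(v) for v in children[u]]
            result = nets[0]
            for net in nets[1:]:
                result = intersect(result, net)
            networks[u] = result
        return networks[u]

    for u in children:
        visit(u)
    return networks


def leaves(u, children):
    """Vertices of out-degree 0 reachable from u."""
    if not children.get(u):
        return {u}
    found = set()
    for v in children[u]:
        found |= leaves(v, children)
    return found


# Toy DAG (hypothetical): o2 is shared by the two groups g1 and g2.
children = {"root": ["g1", "g2"], "g1": ["o1", "o2"], "g2": ["o2", "o3"]}
leaf_networks = {
    "o1": {"A": ({1, 2, 3}, {(1, 2), (2, 3)}), "B": ({4, 5}, {(4, 5)})},
    "o2": {"A": ({1, 2}, {(1, 2)}), "B": ({4, 5}, {(4, 5)})},
    "o3": {"A": ({1, 2, 9}, {(1, 2)})},
}
networks = compute_networks(children, leaf_networks)
```

On this toy input, the network of `root` equals the intersection of the networks of `leaves("root", children)`, which is the property stated in the text.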

According to the task defined in Section 2, we chose to use an
organism taxonomy tree containing all the organisms to compare.
This hierarchy comes from the NCBI (National Center for
Biotechnology Information) database. The resulting taxonomy tree
contains more than 130 vertices and has a depth equal to 30. This
tree contains very long branches with no ramification, and thus
many vertices do not bring any information (see Figure 4(a)).
Since these nodes will not be compared to any other one, we
simplify the taxonomy tree by removing them. To do so, we forbid
sequences of vertices $u_1, u_2, \ldots, u_{k-1}, u_k$ of degree
equal to 2 with $k > 4$ by removing all nodes $u_i$ with
$3 \le i \le k-2$. Figure 4(b) shows the result of this process:
we obtain a simplified taxonomy tree with 81 vertices and a depth
equal to 8. 
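
A possible implementation of this simplification is sketched below. It assumes the tree is given as a dictionary mapping each vertex to its list of children, and it reads the bounds on $i$ as inclusive, so that only the first two and the last two vertices of each long run of degree-2 vertices survive. The function name `simplify_tree` and the toy tree are hypothetical.

```python
def simplify_tree(children, root):
    """Shorten every maximal run of degree-2 vertices to at most 4 vertices.

    A vertex has degree 2 when it has one parent and exactly one child.
    For a run u_1, ..., u_k with k > 4, the vertices u_3, ..., u_{k-2}
    are dropped and u_2 is linked directly to u_{k-1}.
    """
    simplified = {}
    stack = [root]
    while stack:
        u = stack.pop()
        simplified[u] = []
        for c in children.get(u, []):
            # Follow the run of single-child vertices starting at c.
            chain, v = [], c
            while len(children.get(v, [])) == 1:
                chain.append(v)
                v = children[v][0]
            if len(chain) > 4:
                chain = chain[:2] + chain[-2:]
            # Re-link u -> (kept run) -> v, then continue from v.
            path = [u] + chain + [v]
            for a, b in zip(path, path[1:]):
                simplified.setdefault(a, []).append(b)
            stack.append(v)
    return simplified


# Toy tree (hypothetical): a run of five degree-2 vertices a1..a5.
tree = {"r": ["a1"], "a1": ["a2"], "a2": ["a3"], "a3": ["a4"],
        "a4": ["a5"], "a5": ["b"], "b": ["x", "y"]}
small = simplify_tree(tree, "r")
```

In the toy tree, only `a3` is removed and `a2` becomes the parent of `a4`.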

4 Visualization 

As mentioned in Section 2, the biological task requires
visualizing the networks and the hierarchy. To follow biologists'
representations, mainly inherited from textbooks, we carefully
chose our drawing algorithms. Moreover, due to the large amount
of data displayed, we adapted and implemented navigation methods. 

4.1 Drawings 

The representation of a single metabolic network has been
intensively investigated in recent years [9] [2] [14]. It is a
challenging graph drawing problem for three reasons (in addition
to those described in [11]). First, these networks contain
hundreds of metabolic pathways made of more than one thousand
metabolic reactions. Second, there exist drawing constraints (for
hierarchies and cycles) defined according to textbook drawings of
the pathways. Finally, biologists expect to be able to visually
identify each pathway. For instance, in our visualization,
clusters in purple represent metabolic pathways and clusters in
yellow represent particular topological structures such as cycles
or reaction cascades. This last point is of utmost importance
since the biologists' aim is to discover a new network according
to the pathways it contains. 

To draw the hierarchy, we use two different algorithms,
depending on the topology of $H$. If $H$ is a tree, we use a
dendrogram representation of the hierarchy (see, for instance,
Figure 5(a)). Otherwise, we use a modified version of the
hierarchical algorithm presented in [1]. This modification
consists in laying out all vertices with an out-degree equal to 0
in the same layer, to easily identify the organisms. 



The task consists in discovering networks in their context (the
hierarchy). It is thus necessary to provide a focus-plus-context
facility, that is, a way to get both closer views and a
representation of the context. A well-suited visualization method
is the fisheye method [5]. In the next section we describe how we
adapted the fisheye to smoothly modify the hierarchical
representation. 

4.2 Fisheye for hierarchies 

The fisheye method expands the representation around a focal
point and shrinks the rest of the view. It is generally based on
a function of the distance to the focal point [12]. But applying
such a method to a graph may create edge and/or node overlaps. To
protect our representation from these artifacts, we propose a
different way to compute the fisheye view. This computation is
performed in three steps: compute the size of the vertices, apply
a drawing algorithm that takes vertex sizes into account, and
shift the view. 

Size of the vertices. We consider that the user's focus is at
the position of the mouse pointer. The size of each vertex
therefore depends on its distance to the focus: the closer a
vertex is to the focus, the bigger it is. 
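
The text does not give the exact sizing function, so the following Python sketch is only one possible choice: a size that decreases smoothly with the Euclidean distance to the focus. The function name `vertex_sizes`, the falloff formula and the parameters `min_size`, `max_size` and `radius` are all assumptions made for illustration.

```python
import math


def vertex_sizes(positions, focus, min_size=1.0, max_size=4.0, radius=100.0):
    """Return a size per vertex that shrinks with distance to the focus.

    A vertex located exactly at the focus gets `max_size`; sizes tend
    to `min_size` far away. `radius` controls how fast sizes fall off.
    """
    sizes = {}
    for v, (x, y) in positions.items():
        d = math.hypot(x - focus[0], y - focus[1])
        sizes[v] = min_size + (max_size - min_size) / (1.0 + (d / radius) ** 2)
    return sizes


# Toy layout (hypothetical coordinates), with the pointer over vertex "a".
positions = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (300.0, 0.0)}
sizes = vertex_sizes(positions, focus=(0.0, 0.0))
```

Any other monotonically decreasing function of the distance would satisfy the description equally well.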

Drawing algorithm. Once the sizes of the vertices are modified,
the hierarchy representation has to be updated. If this is done
without changing the coordinates of the internal nodes, it may
create edge overlaps. To avoid this problem, we compute a new
representation of the hierarchy. If the hierarchy is a tree, we
use a dendrogram algorithm, which needs to go through all the $m$
edges and $n$ nodes since it takes their sizes into account.
Therefore each update of the display can be done in $O(m + n)$. 

If the hierarchy is a DAG and we directly use a classical DAG
drawing algorithm, the complexity can reach $O(m \times n)$. This
complexity is too high to get a smooth display. But by taking
into account the specificities of our visualization, the layout
can be computed in $O(m + n)$. We base our method on the
hierarchical algorithm presented in [1], which works as follows:
first, vertices are assigned to layers; then the vertices of each
layer are ordered to minimize the number of edge crossings;
finally, coordinates are assigned to each vertex. To update the
display, we only need to recompute the last step of this
algorithm since the layers assigned to the vertices and the order
within each layer do not change. Layer placement and node
placement within each layer can trivially be done in
$O(m + n)$. 
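
To illustrate why only the last step needs to be redone, here is a deliberately simplified Python sketch of a coordinate assignment that reuses fixed layers and fixed orders. It is not the coordinate assignment of [1]: it merely packs vertices left to right in each layer according to their current sizes, and the names `assign_coordinates`, `layers`, `sizes` and `gap` are hypothetical.

```python
def assign_coordinates(layers, sizes, gap=1.0):
    """Place vertices layer by layer without overlap.

    `layers` is a list of lists: layer membership and the order inside
    each layer are taken as given (they do not depend on vertex sizes).
    Each vertex is visited once, so the update is linear in the number
    of vertices.
    """
    coords = {}
    y = 0.0
    for layer in layers:
        height = max(sizes[v] for v in layer)
        x = 0.0
        for v in layer:
            # Center of the square of side sizes[v] starting at abscissa x.
            coords[v] = (x + sizes[v] / 2.0, y + height / 2.0)
            x += sizes[v] + gap
        y += height + gap
    return coords


# Toy hierarchy (hypothetical): one root above two organisms.
layers = [["r"], ["a", "b"]]
sizes = {"r": 2.0, "a": 1.0, "b": 3.0}
coords = assign_coordinates(layers, sizes)
```

When the focus moves, only `sizes` changes; calling `assign_coordinates` again with the same `layers` gives the updated drawing.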

Layout shifting. As vertex sizes are modified, all the vertices
move away from their original positions, thus 



Figure 5. Our visualization tool: (a) view of the hierarchy; (b) drawing of the focused network, here
Buchnera aphidicola APS; (c) list of the names of all metabolic pathways. Here, Valine biosynthesis is
selected in (c): in the hierarchy, networks highlighted in pink contain that pathway. In (b), compounds
and reactions of the pathway are highlighted in the focused network. 



affecting the user's mental map. Moreover, these coordinate
modifications can accumulate with the number of focal point
changes. To bound the number of moves in the drawing, we
constrain vertex positions. This is done by translating the
hierarchy such that the focus vertex is set back to its original
position. Then, before updating the display, the view is shifted
such that the distance between the mouse pointer and the focus is
kept unchanged. Consequently, the user's mental map is preserved. 
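
The translation step can be sketched in a few lines of Python. The sketch covers only the translation of the hierarchy, not the subsequent shift of the view, and the names `pin_focus`, `old_coords` and `new_coords` are hypothetical.

```python
def pin_focus(new_coords, old_coords, focus_vertex):
    """Translate a new layout so the focus vertex keeps its old position."""
    dx = old_coords[focus_vertex][0] - new_coords[focus_vertex][0]
    dy = old_coords[focus_vertex][1] - new_coords[focus_vertex][1]
    return {v: (x + dx, y + dy) for v, (x, y) in new_coords.items()}


# Toy layouts (hypothetical coordinates) before and after resizing.
old_coords = {"u": (10.0, 5.0), "v": (20.0, 5.0)}
new_coords = {"u": (14.0, 6.0), "v": (30.0, 6.0)}
pinned = pin_focus(new_coords, old_coords, "u")
```

After the call, the focus vertex `u` is back at its original position and the other vertices keep their new relative offsets.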

4.3 Navigation through the different scales 

Figure 5 shows a screenshot of our visualization tool. This tool
contains three main widgets. First, a view of the hierarchy is
shown in Figure 5(a). Then, in the top right corner (see
Figure 5(b)), there is a view of a metabolic network. Finally, in
the bottom right corner, there is a list of all pathways
contained in the organisms to compare. There also exists a view
at the metabolic pathway level since widget (b) of Figure 5
allows switching from a metabolic network view to a metabolic
pathway view. 

Therefore, our visualization tool allows navigating from 
the highest level (the hierarchy) to the lowest level (the 
metabolic pathways). 



Highest level: Hierarchy. All pathways of all organisms to
compare are listed in widget (c) of Figure 5. Clicking on one of
these pathways focuses on it. Then, to show which networks of the
hierarchy contain the current pathway, these networks are
highlighted in pink. For instance, in Figure 5 we selected the
Valine Biosynthesis pathway; we can see that the synthesis of
this essential amino acid is not present in all organisms: for
instance, no Rickettsiales can synthesize it. On the contrary,
Caulobacter vibrioides and Agrobacterium tumefaciens, i.e. the
other Alphaproteobacteria of the set of organisms, can synthesize
it. It is thus possible to discriminate Rickettsiales from other
organisms or, using the hierarchy, classes of organisms. 

Intermediate level: Metabolic network. To focus on a particular
metabolic network, the user can apply a fisheye distortion to the
hierarchy (in Figure 5, the organism at the center of the fisheye
is Buchnera aphidicola APS). The vertex $u$ laid out closest to
the center of the fisheye is focused. The metabolic network
represented by $u$ is then displayed in widget (b) of Figure 5. 

If a metabolic pathway has been selected and the focused
organism contains this pathway, then it is highlighted in pink in
the metabolic network view. This allows visualizing a pathway in
its context (in Figure 5, the Valine Biosynthesis 




Figure 6. Focus on the Valine biosynthesis pathway: on the left,
the corresponding pathway in Caulobacter vibrioides, and on the
right, that of Agrobacterium tumefaciens. 



pathway is highlighted in Buchnera aphidicola). 

Lowest level: Metabolic pathway. At the lowest level, the user
can compare metabolic pathways in different organisms. We provide
this facility since two pathways having the same name may not
contain exactly the same compounds and reactions. Indeed, there
can be several ways to produce the same compound. When a pathway
is selected, double-clicking on an organism containing this
pathway displays the corresponding pathway of this organism.
Figure 6 shows the Valine Biosynthesis pathways of Caulobacter
vibrioides and Agrobacterium tumefaciens. We can see that these
two pathways share two reactions but also that two reactions are
missing in the pathway of Agrobacterium tumefaciens (reactions
1.1.1.86 and 4.2.1.9). 

5 Conclusion 

In this article we propose a generic way to take advantage of
network clustering when visually comparing networks. Moreover, we
show that adding a context to the visualization helps in
understanding the data. In fact, the context helps to choose the
comparisons to perform. It also provides clues about the
biological families and how they share metabolic pathways. We
present an implementation of this method for a particular
biological case study. Based on this representation, biologists
were able to put new biological networks into a wider context. 

Since this project was done in collaboration with biologists
focused on specific networks, we were not able to test our
approach on other data. We plan, for instance, to use it on
protein-protein interaction networks. We also plan to adapt this
method to other domains, such as collections of indexed videos
(or images), to facilitate the search for a given video (or
image). From a visualization point of view, we are improving the
navigation by implementing a pathway alignment algorithm [4],
which will highlight the common reactions and compounds of
several pathways. 